
[FA4] Update flash-attention to latest upstream FA4 #38690

Merged
LucasWilkinson merged 2 commits into vllm-project:main from neuralmagic:lwilkinson/update-fa4
Apr 2, 2026

Conversation

@LucasWilkinson
Collaborator

Testing PR for updating FA4 to latest upstream

- Point vllm_flash_attn.cmake to the updated FA branch (95e93d2), which syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention.
- Bump nvidia-cutlass-dsl>=4.4.2 and quack-kernels>=0.3.3 to match upstream FA4 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
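From the description above, the change likely reduces to a one-line Git tag bump in the CMake external-project file plus two raised version floors in the CUDA requirements file. The following is a sketch only; the file paths, surrounding context, and the replaced ("previous") values are assumptions, not taken from the actual diff:

```diff
# Sketch only -- paths and prior values are assumptions.
--- a/cmake/external_projects/vllm_flash_attn.cmake
+++ b/cmake/external_projects/vllm_flash_attn.cmake
-        GIT_TAG <previous pinned commit>
+        GIT_TAG 95e93d2  # syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention

--- a/requirements/cuda.txt
+++ b/requirements/cuda.txt
-nvidia-cutlass-dsl>=<previous floor>
+nvidia-cutlass-dsl>=4.4.2
-quack-kernels>=<previous floor>
+quack-kernels>=0.3.3
```

Pinning the vendored vllm-flash-attn checkout to an exact commit (rather than a branch head) keeps the build reproducible while the requirement floors ensure the Python-side CUTLASS DSL and quack-kernels packages match what upstream FA4 expects.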
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the vllm-flash-attn Git tag to a newer commit and bumps the minimum versions for nvidia-cutlass-dsl and quack-kernels in the CUDA requirements file. I have no feedback to provide.

@LucasWilkinson added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Apr 1, 2026
@MatthewBonanni
Collaborator

This will fix #36763 thanks to the inclusion of Dao-AILab/flash-attention@0293155

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Collaborator

@MatthewBonanni left a comment


LGTM

@github-project-automation (bot) moved this to Ready in NVIDIA on Apr 2, 2026
@LucasWilkinson changed the title from "[WIP][Do not merge yet] Update flash-attention to latest upstream FA4" to "[FA4] Update flash-attention to latest upstream FA4" on Apr 2, 2026
@LucasWilkinson enabled auto-merge (squash) on April 2, 2026, 14:37
@LucasWilkinson merged commit cb3935a into vllm-project:main on Apr 2, 2026
139 of 140 checks passed
@github-project-automation (bot) moved this from Ready to Done in NVIDIA on Apr 2, 2026
mieshkiwrk pushed a commit to mieshkiwrk/vllm that referenced this pull request Apr 2, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Apr 6, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
askliar pushed a commit to netanel-haber/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
USTCKAY pushed a commit to USTCKAY/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Song Kai <songkai05@baidu.com>
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
big-yellow-duck pushed a commit to EmbeddedLLM/vllm that referenced this pull request Apr 8, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
jackcfwang pushed a commit to jackcfwang/vllm that referenced this pull request Apr 10, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: jackcfwang <jackcfwang@tencent.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 10, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
c2w-sea added a commit to coreweave/ml-containers that referenced this pull request Apr 10, 2026
- vLLM: v0.19.0 -> v0.19.1rc0
- FlashInfer: v0.6.6 -> v0.6.7

v0.19.1rc0 is the first release with the FA4 NaN fix (PR #38690)
for Blackwell/SM100 and TRTLLM as default MLA prefill backend.
Eliminates the need for Dockerfile-level FA4 patching.

Refs: vllm-project/vllm#38690, INF-353

Labels

ci/build · nvidia · ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


2 participants